feat: sglang guided decoding support by jellysnack · Pull Request #6620 · ai-dynamo/dynamo

jellysnack · 2026-02-26T10:01:23Z

Overview:

Add guided decoding support for the SGLang engine and replace skip_tokenizer_init with use_sglang_tokenizer for tokenizer selection.

Details:

SGLang's GrammarManager disables grammar_backend when skip_tokenizer_init=True (ref), which makes guided decoding impossible. To fix this:

- Removed the skip_tokenizer_init=True override. SGLang still supports both token-based and text-based inputs without it.
- Replaced all skip_tokenizer_init references with use_sglang_tokenizer throughout handlers and registration logic.
- Added a _get_guided_decoding_params() helper in BaseWorkerHandler that extracts json_schema from the guided_decoding request params and forwards it to SGLang's sampling params.
- Wired guided decoding params into DecodeWorkerHandler and PrefillWorkerHandler.
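
For illustration, the extraction step described above might look roughly like the following. This is a minimal sketch, not the code merged in this PR: the free function name `get_guided_decoding_params` is hypothetical, the `"json"` input key is taken from the CodeRabbit review comment further down, and the handling of an already-serialized schema is an assumption.

```python
import json
from typing import Any


def get_guided_decoding_params(guided_decoding: Any) -> dict[str, str]:
    """Hypothetical stand-in for BaseWorkerHandler._get_guided_decoding_params()."""
    # Anything that is not a dict (including None) means "no guided decoding".
    if not isinstance(guided_decoding, dict):
        return {}
    schema = guided_decoding.get("json")
    if schema is None:
        return {}
    # Assumption: a schema that is already a string is passed through as-is,
    # otherwise it is serialized so it can be sent as the json_schema param.
    if isinstance(schema, str):
        return {"json_schema": schema}
    return {"json_schema": json.dumps(schema)}
```

Returning an empty dict when nothing is requested keeps the call sites simple, because the result can be merged into the sampling params unconditionally.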

Where should the reviewer start?

- components/src/dynamo/sglang/request_handlers/handler_base.py – new _get_guided_decoding_params() method
- components/src/dynamo/sglang/args.py – removal of skip_tokenizer_init
- components/src/dynamo/sglang/request_handlers/llm/decode_handler.py – guided decoding integration and the switch from skip_tokenizer_init to use_sglang_tokenizer

Reproducer

Start the backend:

python -m dynamo.sglang --discovery-backend file --model-path Qwen/Qwen3-0.6B
python3 -m dynamo.frontend --discovery-backend file --http-port 8000

Send a request with JSON schema:

from openai import OpenAI

schema = {
    "name": "schema",
    "schema": {
        "type": "object",
        "properties": {
            "response": {
                "type": "string"
            }
        }
    }
}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="foo")
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hi!"}],
    temperature=0,
    response_format={"type": "json_schema", "json_schema": schema},
    extra_body={"chat_template_args": {"enable_thinking": False}},
)
print(response.choices[0].message.content)

Before fix:

Hello! How can I assist you today?

After fix:

{ "response": "Hello! How can I assist you today?" }
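
To compare the two outputs above programmatically, a small check along these lines could be appended to the reproducer. It is only a sketch: the function name `matches_reproducer_schema` is hypothetical, and it checks just the shape declared in the reproducer's schema (an object whose optional `response` property is a string) rather than performing full JSON Schema validation.

```python
import json


def matches_reproducer_schema(content: str) -> bool:
    """Return True if content is a JSON object compatible with the reproducer schema."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        # Plain text such as the "before fix" output ends up here.
        return False
    if not isinstance(parsed, dict):
        return False
    # The schema does not list "response" as required, so only its type is checked.
    return "response" not in parsed or isinstance(parsed["response"], str)


before = "Hello! How can I assist you today?"
after = '{ "response": "Hello! How can I assist you today?" }'
print(matches_reproducer_schema(before), matches_reproducer_schema(after))
```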

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Refactor
- Simplified internal tokenizer initialization configuration and improved guided decoding parameter handling across the request processing pipeline.

Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>

copy-pr-bot · 2026-02-26T10:01:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-02-26T10:01:32Z

👋 Hi jellysnack! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2026-02-26T10:06:34Z

Walkthrough

This change refactors tokenizer flag handling by replacing skip_tokenizer_init with a new use_sglang_tokenizer flag across sglang handler files. Additionally, a new _get_guided_decoding_params helper method is introduced to extract JSON schema parameters from guided decoding configurations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tokenizer Flag Migration: `components/src/dynamo/sglang/args.py`, `components/src/dynamo/sglang/register.py` | Removed direct assignment of `skip_tokenizer_init` flag; updated condition checks to use `use_sglang_tokenizer` instead, with simplified logging messages reflecting the tokenizer choice without state mutation. |
| Handler Core Updates: `components/src/dynamo/sglang/request_handlers/handler_base.py` | Added new static method `_get_guided_decoding_params()` to extract JSON schema from guided decoding dictionaries; replaced `skip_tokenizer_init` with `use_sglang_tokenizer` flag for tokenizer initialization control in `InputParamManager`. |
| Handler Implementations: `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`, `components/src/dynamo/sglang/request_handlers/llm/diffusion_handler.py`, `components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` | Replaced conditional branching from `skip_tokenizer_init` to `use_sglang_tokenizer` for stream processor selection; integrated guided decoding parameters into sampling configurations by merging `_get_guided_decoding_params()` results. |
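
The merge step mentioned in the Handler Implementations row can be pictured with a short sketch. It is illustrative only and not taken from the PR diff: `build_sampling_params` and the `sampling_options` request layout are hypothetical, while `temperature`, `max_new_tokens` and `json_schema` are used here as SGLang sampling parameter names.

```python
from typing import Any


def build_sampling_params(request: dict[str, Any], guided: dict[str, str]) -> dict[str, Any]:
    """Merge guided decoding params into a sampling-params dict (illustrative)."""
    # Hypothetical request layout: user options live under "sampling_options".
    options = request.get("sampling_options", {})
    sampling: dict[str, Any] = {
        "temperature": options.get("temperature", 1.0),
        "max_new_tokens": options.get("max_tokens", 128),
    }
    # An empty dict from the helper leaves the sampling params unconstrained.
    sampling.update(guided)
    return sampling
```

Because the helper's result is merged last, a request without guided decoding produces the same sampling params as before the change.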

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A tokenizer flag takes flight,
From skip to use—a clearer sight!
Guided decoding joins the show,
Through handlers six, the refactors flow! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature addition: guided decoding support for SGLang, which aligns with the core objective of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description includes all required sections: Overview, Details, Where should the reviewer start, and Related Issues. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 391-400: The helper _get_guided_decoding_params currently only
reads guided_decoding["json"] and will ignore requests using
guided_decoding["json_schema"]; update _get_guided_decoding_params to accept
both keys (prefer "json_schema" if present, fall back to "json") and return
{"json_schema": json.dumps(value)} when either is supplied so schema constraints
are preserved; ensure the existing type check on guided_decoding
(isinstance(..., dict)) remains and return {} when neither key exists.
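
The behaviour requested in this comment could be sketched as follows. This only illustrates the suggestion and is not necessarily how the thread was resolved; the function name `guided_params_both_keys` is hypothetical.

```python
import json
from typing import Any


def guided_params_both_keys(guided_decoding: Any) -> dict[str, str]:
    """Accept "json_schema" or "json", preferring "json_schema" (illustrative)."""
    # Keep the existing type check: non-dict input means no constraints.
    if not isinstance(guided_decoding, dict):
        return {}
    value = guided_decoding.get("json_schema")
    if value is None:
        # Fall back to the legacy "json" key.
        value = guided_decoding.get("json")
    if value is None:
        return {}
    return {"json_schema": json.dumps(value)}
```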

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db88c95 and 7390085.

📒 Files selected for processing (6)

components/src/dynamo/sglang/args.py
components/src/dynamo/sglang/register.py
components/src/dynamo/sglang/request_handlers/handler_base.py
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
components/src/dynamo/sglang/request_handlers/llm/diffusion_handler.py
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py

components/src/dynamo/sglang/request_handlers/handler_base.py

ishandhanani · 2026-03-16T03:59:50Z

Hey @jellysnack - thank you for the PR. Sorry, I have been very busy overall. This PR is on my TODO list to review this week.

Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>

jellysnack · 2026-03-24T14:06:17Z

just a gentle ping on this PR

jellysnack · 2026-04-06T18:16:24Z

@ishandhanani Could you take a look when you get a chance? Thanks!

rmccorm4 · 2026-04-07T21:06:41Z

/ok to test a18b513

ayushag-nv · 2026-04-07T21:22:48Z

@jellysnack Thanks for contributing. Gentle reminder to add a reproducer and output before and after fix.

dmitry-tokarev-nv · 2026-04-07T23:45:37Z

sorry for jumping in. Need to refresh this PR to pull a fix for Allure reporting.

dmitry-tokarev-nv · 2026-04-07T23:45:55Z

/ok to test ac19772

jellysnack · 2026-04-08T11:38:51Z

> @jellysnack Thanks for contributing. Gentle reminder to add a reproducer and output before and after fix.

Done

ayushag-nv

Looks good to me !

jellysnack added 2 commits February 26, 2026 12:52

sglang guided decoding support

1757e93

Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>

sglang guided decoding support

7390085

Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>

jellysnack requested review from a team as code owners February 26, 2026 10:01

pull-request-size bot added the size/M label Feb 26, 2026

github-actions bot added feat external-contribution Pull request is from an external contributor backend::sglang Relates to the sglang backend labels Feb 26, 2026

coderabbitai bot reviewed Feb 26, 2026

components/src/dynamo/sglang/request_handlers/handler_base.py (resolved)

jellysnack added 2 commits March 1, 2026 23:06

Merge branch 'main' into feat/sglang-guided-decoding-support

79743b3

Merge branch 'main' into feat/sglang-guided-decoding-support

55919ae

jellysnack mentioned this pull request Mar 4, 2026

[CONTRIBUTION]: Add guided decoding support for SGLang engine #6871

Closed

vladnosiv mentioned this pull request Mar 5, 2026

[Bug] Streaming token ids data loss under load (affects Nvidia Dynamo) sgl-project/sglang#19976

Closed

jellysnack and others added 3 commits March 16, 2026 11:57

Merge branch 'main' into feat/sglang-guided-decoding-support

02266e7

Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>

Merge branch 'main' into feat/sglang-guided-decoding-support

df0a52f

Merge branch 'main' into feat/sglang-guided-decoding-support

fb00bbd

vladnosiv mentioned this pull request Apr 7, 2026

fix: tool_choice=required bypasses format-specific parsers (#6821) #7589

Merged

Merge branch 'main' into feat/sglang-guided-decoding-support

a18b513

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2026 21:06 Inactive

rmccorm4 requested a review from KrishnanPrash April 7, 2026 21:07

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2026 23:07 Inactive

Merge branch 'main' into feat/sglang-guided-decoding-support

ac19772

copy-pr-bot bot temporarily deployed to GITLAB April 7, 2026 23:46 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 8, 2026 02:10 Inactive

ayushag-nv approved these changes Apr 8, 2026

ayushag-nv merged commit cf79c4f into ai-dynamo:main Apr 8, 2026
66 checks passed
